Posterior Contraction Rates of the Phylogenetic Indian Buffet Processes
By expressing prior distributions as general stochastic processes,
nonparametric Bayesian methods provide a flexible way to incorporate prior
knowledge and constrain the latent structure in statistical inference. The
Indian buffet process (IBP) is one such example: it defines a prior
distribution on infinite binary features under the assumption that subjects
are exchangeable. The phylogenetic Indian buffet process (pIBP), a
derivative of IBP, enables the modeling of non-exchangeability among subjects
through a stochastic process on a rooted tree, which is similar to that used in
phylogenetics, to describe relationships among the subjects. In this paper, we
study the theoretical properties of IBP and pIBP under a binary factor model.
We establish the posterior contraction rates for both IBP and pIBP and
substantiate the theoretical results through simulation studies. This is the
first work to address the frequentist properties of the posterior behavior of
IBP and pIBP. We also demonstrate the practical usefulness of the pIBP prior
by applying it to a real data example from cancer genomics in which
exchangeability among subjects is violated.
Dynamic Graph Attention for Anomaly Detection in Heterogeneous Sensor Networks
In the era of digital transformation, systems monitored by the Industrial
Internet of Things (IIoT) generate large amounts of multivariate time series
(MTS) data through heterogeneous sensor networks. While this data facilitates
condition monitoring and anomaly detection, the increasing complexity and
interdependencies within the sensor network pose significant challenges for
anomaly detection. Despite progress in this field, much of the focus has been
on point anomalies and contextual anomalies, with less attention paid to
collective anomalies. A common but rarely addressed variant of collective
anomaly arises when the abnormal collective behavior is caused by shifts in
interrelationships within the system. Such shifts can stem from abnormal environmental
conditions like overheating, improper operational settings resulting from
cyber-physical attacks, or system-level faults. To address these challenges,
this paper proposes DyGATAD (Dynamic Graph Attention for Anomaly Detection), a
graph-based anomaly detection framework that leverages the attention mechanism
to construct a continuous graph representation of multivariate time series by
inferring dynamic edges between time series. DyGATAD incorporates an operating
condition-aware reconstruction combined with a topology-based anomaly score,
thereby improving the detection of relationship shifts. We evaluate the
performance of DyGATAD using both a synthetic dataset with controlled, varying
fault severity levels and an industrial-scale multiphase flow facility
benchmark featuring various fault types with different detection difficulties.
Our proposed approach demonstrated superior performance in collective anomaly
detection for sensor networks, showing particular strength in early-stage fault
detection, even in the case of faults with minimal severity.
Comment: 15 pages, 7 figures
A Multilingual BPE Embedding Space for Universal Sentiment Lexicon Induction
We present a new method for sentiment lexicon induction that is designed to be applicable to the entire range of typological diversity of the world’s languages. We evaluate our method on Parallel Bible Corpus+ (PBC+), a parallel corpus of 1593 languages. The key idea is to use Byte Pair Encodings (BPEs) as basic units for multilingual embeddings. Through zero-shot transfer from English sentiment, we learn a seed lexicon for each language in the domain of PBC+. Through domain adaptation, we then generalize the domain-specific lexicon to a general one. We show – across typologically diverse languages in PBC+ – good quality of seed and general-domain sentiment lexicons by intrinsic and extrinsic and by automatic and human evaluation. We make freely available our code, seed sentiment lexicons for all 1593 languages and induced general-domain sentiment lexicons for 200 languages.